ensemble PCA

Nick McKay

2020-07-02

library(lipdR)
library(geoChronR)
library(magrittr)
library(dplyr)
library(purrr)

This vignetee showcases the ability to perform principal component analysis (PCA, also known as empirical orthogonal function (EOF) analysis. Data are from the Arctic 2k compilation, which we load here:

FD <- readLipd("http://lipdverse.org/geoChronR-examples/arc2k/Arctic2k.zip") 
## [1] "reading: Arc-Agassiz.Vinther.2008.lpd"
## [1] "reading: Arc-Austfonna.Isaksson.2005.lpd"
## [1] "reading: Arc-CampCentury.Fisher.1969.lpd"
## [1] "reading: Arc-Crete.Vinther.2010.lpd"
## [1] "reading: Arc-DevonIceCap.Fisher.1983.lpd"
## [1] "reading: Arc-Dye.Vinther.2010.lpd"
## [1] "reading: Arc-GISP2.Grootes.1997.lpd"
## [1] "reading: Arc-GRIP.Vinther.2010.lpd"
## [1] "reading: Arc-Hvtrvatn.Larsen.2011.lpd"
## [1] "reading: Arc-LakeC2.Lamoureux.1996.lpd"
## [1] "reading: Arc-LakeDonardBaffinIsland.Moore.2001.lpd"
## [1] "reading: Arc-LakeLehmilampi.Haltia-Hovi.2007.lpd"
## [1] "reading: Arc-LakeNataujrvi.Ojala.2005.lpd"
## [1] "reading: Arc-LowerMurrayLake.Cook.2008.lpd"
## [1] "reading: Arc-NGRIP1.Vinther.2006.lpd"
## [1] "reading: Arc-NGTB16.Schwager.1998.lpd"
## [1] "reading: Arc-NGTB18.Schwager.1998.lpd"
## [1] "reading: Arc-NGTB21.Schwager.1998.lpd"
## [1] "reading: Arc-Renland.Vinther.2008.lpd"

##Make a map

mapLipd(FD,map.type = "line",projection = "stereo",f = 0.1)

More map projections are available too. A list is available here: ?mapproject

##Grab the age ensembles for each record. We need to “map”" the age ensembles to paleo for all of these datasets. We’ll use purrr::map for this, but you could also do it with sapply(). In this case we’re going to specify that all of the age ensembles are named “ageEnsemble”, and that they don’t have a depth variable because they’re layer counted.

FD2 = purrr::map(FD,
                 mapAgeEnsembleToPaleoData,
                 strict.search = TRUE,
                 age.var = "ageEnsemble",
                 depth.var = NULL )

Now extract all the “timeseries” into at “TS object” that will facilitate working with multiple records.

TS = extractTs(FD2)

and filter the TS object to only include variables that have been interepreted as temperature:

TS.filtered = filterTs(TS,"interpretation1_variable == T")

OK, let’s make a quick plot stack to see what we’re dealing with.

tidyDf <- tidyTs(TS.filtered)
plotTimeseriesStack(tidyDf, 
                    color.var = "paleoData_variableName", 
                    color.ramp = c("DarkBlue","Orange","Black","Dark Green"),
                    line.size = .1, 
                    fill.alpha = .05,
                    lab.size = 2,
                    lab.space = 3)

Now bin all the data in the TS from 1400 to 2000, an interval of pretty good data coverage, into 5 year bins.

binned.TS = binTs(TS.filtered,bin.vec = seq(1400,2000,by=5),time.var = "ageEnsemble")

and calculate PCA on each ensemble member:

pcout = pcaEns(binned.TS)

OK! Let’s plot the results.

plotPCA = plotPcaEns(pcout,TS = TS.filtered,map.type = "line",projection = "stereo",bound.circ = T,restrict.map.range = T,f=.1,legend.position = c(0.5,.6),which.pcs = 1:2,which.leg = 2)

Nice! A summary plot that combines the major features is produced, but all of the components, are included in the “plotPCA” list that was exported.

Here’s the first map

plotPCA$maps[[1]]

The second timeseries

plotPCA$lines[[2]]

A

plotPCA$sampleDepth

this time - grab only those that are d18O, and use a covariance matrix let’s look at all the names in the TS

var.names <- pullTsVariable(TS, "variableName")

Oops - looks like we didn’t use quite the correct name. Next time use:

var.names <- pullTsVariable(TS, "paleoData_variableName")

and take a look at the unique variableNames in the TS

unique(var.names)
## [1] "d18O"                    "year"                   
## [3] "ageEnsemble"             "thickness"              
## [5] "X_radiograph_dark_layer" "massacum"

OK. Let’s filter the timeseries again, this time pulling all the d18O data.

d18OTS = filterTs(TS,"paleoData_variableName == d18O")
tidyd18O <- tidyTs(d18OTS)
#arrange the tidy dataframe by record length
tidyd18O <- tidyd18O %>% 
  group_by(paleoData_TSid) %>% 
  mutate(range = max(year) - min(year)) %>% 
  arrange(range)
plotTimeseriesStack(tidyd18O, 
                    color.var = "paleoData_variableName", 
                    color.ramp = c("DarkBlue"),
                    line.size = .1, 
                    fill.alpha = .05,
                    lab.size = 2,
                    lab.space = 2,
                    lab.buff = 0.03)

Now, we’ll bin again.

binned.TS2 = binTs(d18OTS,bin.vec = seq(1400,2000,by=5),na.col.rm = T)

And calculate the ensemble PCA, this time using a covariance matrix

pcout2 = pcaEns(binned.TS2,pca.type = "cov")
plotPCA2 = plotPcaEns(pcout2,TS = d18OTS,map.type = "line",projection = "stereo",bound.circ = T,restrict.map.range = T,f=.2)